
    Fast and Interpretable Nonlocal Neural Networks for Image Denoising via Group-Sparse Convolutional Dictionary Learning

    Nonlocal self-similarity within natural images has become an increasingly popular prior in deep-learning models. Despite their successful image restoration performance, such models remain largely uninterpretable due to their black-box construction. Our previous studies have shown that an interpretable construction of a fully convolutional denoiser (CDLNet), with performance on par with state-of-the-art black-box counterparts, is achievable by unrolling a dictionary learning algorithm. In this manuscript, we seek an interpretable construction of a convolutional network with a nonlocal self-similarity prior that performs on par with black-box nonlocal models. We show that such an architecture can be effectively achieved by upgrading the ℓ1 sparsity prior of CDLNet to a weighted group-sparsity prior. From this formulation, we propose a novel sliding-window nonlocal operation, enabled by sparse array arithmetic. In addition to competitive performance with black-box nonlocal DNNs, we demonstrate that the proposed sliding-window sparse attention enables inference speeds more than an order of magnitude faster than its competitors.
    Comment: 11 pages, 8 figures, 6 tables
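    As a concrete illustration of the group-sparsity upgrade described above, the sketch below contrasts the elementwise soft-thresholding used for an ℓ1 prior with the block (group) soft-thresholding that a weighted group-sparsity prior implies inside an unrolled dictionary-learning iteration. This is a minimal NumPy sketch, not the authors' CDLNet code; the grouping of coefficients and the thresholds are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): the proximal step that replaces
# elementwise soft-thresholding (l1 prior) with weighted group soft-thresholding
# (group-sparsity prior) inside an unrolled dictionary-learning iteration.
# Group definitions and thresholds below are hypothetical placeholders.
import numpy as np

def soft_threshold(z, tau):
    """Elementwise l1 prox: sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def group_soft_threshold(z, tau, groups):
    """Weighted group-sparsity prox (block soft-thresholding).

    z      : 1D array of subband coefficients
    tau    : per-group thresholds (weights), shape (num_groups,)
    groups : list of index arrays, one per group
    """
    out = np.zeros_like(z)
    for g, idx in enumerate(groups):
        norm = np.linalg.norm(z[idx])
        if norm > 0:
            # Shrink the whole group toward zero by a common factor,
            # zeroing it out entirely when its norm is below the threshold.
            out[idx] = z[idx] * max(1.0 - tau[g] / norm, 0.0)
    return out

# Toy usage: coefficients grouped into contiguous blocks of 4 (hypothetical grouping).
z = np.random.randn(16)
groups = [np.arange(i, i + 4) for i in range(0, 16, 4)]
tau = np.full(len(groups), 0.5)
z_sparse = group_soft_threshold(z, tau, groups)
```

    In the nonlocal setting described in the abstract, the groups would presumably be formed from similar coefficients gathered within a sliding window rather than from fixed contiguous blocks as in this toy example.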

    Lateralization in the dichotic listening of tones is influenced by the content of speech

    Available online 10 February 2020.
    Cognitive functions, for example speech processing, are distributed asymmetrically across the two hemispheres, which mostly have homologous anatomical structures. Dichotic listening is a well-established paradigm for investigating hemispheric lateralization of speech. However, the mixed results of dichotic listening, especially when tonal languages are used as stimuli, complicate the investigation of functional lateralization. We hypothesized that the inconsistent results in dichotic listening are due to an interaction in processing a mixture of acoustic and linguistic attributes that are differentially processed over the two hemispheres. In this study, a within-subject dichotic listening paradigm was designed in which different levels of speech and linguistic information were incrementally included in different conditions that required the same tone identification task. A left ear advantage (LEA), in contrast with the commonly found right ear advantage (REA) in dichotic listening, was observed in the hummed tones condition, where only the slow frequency modulation of tones was included. However, when phonemic and lexical information was added in the simple vowel tone conditions, the LEA became unstable. Furthermore, ear preference became balanced when phonological and lexical-semantic attributes were included in the consonant-vowel (CV), pseudo-word, and word conditions. Compared with existing REA results that use complex vowel word tones, a complete pattern emerged, gradually shifting from LEA to REA. These results support the hypothesis that an acoustic analysis of the suprasegmental information of tones is preferentially processed in the right hemisphere but is influenced by phonological and lexical-semantic processes residing in the left hemisphere. The ear preference in dichotic listening depends on the level of speech and linguistic analysis and lateralizes preferentially across the different hemispheres. That is, the manifestation of functional lateralization depends on the integration of information across the two hemispheres.
    This study was supported by National Natural Science Foundation of China 31871131, Major Program of Science and Technology Commission of Shanghai Municipality (STCSM) 17JC1404104, Program of Introducing Talents of Discipline to Universities Base B16018 to XT, the JRI Seed Grants for Research Collaboration from the NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai to XT and QC, NIH 2R01DC05660 to David Poeppel at New York University supporting NM and AF, and F32 DC011985 to AF.

    Neural correlates of sign language production revealed by electrocorticography

    Objective: The combined spatiotemporal dynamics underlying sign language production remain largely unknown. To investigate these dynamics compared to speech production, we used intracranial electrocorticography during a battery of language tasks. Methods: We report a unique case of direct cortical surface recordings obtained from a neurosurgical patient with intact hearing who is bilingual in English and American Sign Language. We designed a battery of cognitive tasks to capture multiple modalities of language processing and production. Results: We identified 2 spatially distinct cortical networks: ventral for speech and dorsal for sign production. Sign production recruited perirolandic, parietal, and posterior temporal regions, while speech production recruited frontal, perisylvian, and perirolandic regions. Electrical cortical stimulation confirmed this spatial segregation, identifying mouth areas for speech production and limb areas for sign production. The temporal dynamics revealed superior parietal cortex activity immediately before sign production, suggesting its role in planning and producing sign language. Conclusions: Our findings reveal a distinct network for sign language and detail the temporal propagation supporting sign production

    Human Screams Occupy a Privileged Niche in the Communication Soundscape

    Screaming is arguably one of the most relevant communication signals for survival in humans. Despite their practical relevance and their theoretical significance as innate [1] and virtually universal [2, 3] vocalizations, what makes screams a unique signal and how they are processed is not known. Here, we use acoustic analyses, psychophysical experiments, and neuroimaging to isolate the features that give screams their alarming nature, and we track their processing in the human brain. Using the modulation power spectrum (MPS [4, 5]), a recently developed, neurally informed characterization of sounds, we demonstrate that human screams cluster within a restricted portion of the acoustic space (between ∼30 and 150 Hz modulation rates) that corresponds to a well-known perceptual attribute, roughness. In contrast to the received view that roughness is irrelevant for communication [6], our data reveal that the acoustic space occupied by the rough vocal regime is segregated from other signals, including speech, a prerequisite for avoiding false alarms in normal vocal communication. We show that roughness is present in natural alarm signals as well as in artificial alarms and that the presence of roughness in sounds boosts their detection in various tasks. Using fMRI, we show that acoustic roughness engages subcortical structures critical to rapidly appraise danger. Altogether, these data demonstrate that screams occupy a privileged acoustic niche that, being separated from other communication signals, ensures their biological and ultimately social efficiency.
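    For readers unfamiliar with the modulation power spectrum (MPS) referenced above, the sketch below computes one common form of it: the 2D Fourier power of a log-spectrogram, whose axes are temporal modulation rate (Hz) and spectral modulation scale. This is a minimal, assumption-labelled sketch, not the authors' analysis pipeline; the window length, hop size, and log compression are illustrative choices.

```python
# Minimal sketch (assumptions, not the paper's pipeline): an MPS computed as
# the 2D Fourier power of a log-spectrogram, giving energy as a function of
# temporal modulation rate (Hz) and spectral modulation scale (cycles/Hz).
import numpy as np
from scipy.signal import spectrogram

def modulation_power_spectrum(x, fs, nperseg=512, noverlap=448):
    # Spectrogram: frequency x time representation of the sound.
    f, t, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    logS = np.log(S + 1e-10)      # log compression
    logS -= logS.mean()           # remove the DC component before the 2D FFT
    M = np.abs(np.fft.fftshift(np.fft.fft2(logS))) ** 2  # 2D power spectrum
    # Axes of the MPS, derived from the spectrogram's time and frequency steps.
    dt = t[1] - t[0]
    df = f[1] - f[0]
    temporal_rates = np.fft.fftshift(np.fft.fftfreq(logS.shape[1], d=dt))   # Hz
    spectral_scales = np.fft.fftshift(np.fft.fftfreq(logS.shape[0], d=df))  # cycles/Hz
    return temporal_rates, spectral_scales, M

# Toy usage: a 1 s synthetic "rough" tone with 70 Hz amplitude modulation,
# i.e. inside the ~30-150 Hz roughness band described in the abstract.
fs = 16000
t = np.arange(fs) / fs
x = (1 + np.sin(2 * np.pi * 70 * t)) * np.sin(2 * np.pi * 500 * t)
rates, scales, mps = modulation_power_spectrum(x, fs)
```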